Sprint 2 — Rate limiting, leverage safety, Redis kill-switch by cluster2600 · Pull Request #17 · cluster2600/ELVIS

cluster2600 · 2026-02-18T06:44:42Z

Sprint 2: Reliability & Risk Management

Issues fixed

[SECURITY] S4 — No Binance API rate limiting (IP ban risk) #12 — Binance API rate limiting: retry decorator (tenacity, exponential backoff), auto-pause at 80% of 1200 req/min limit
[RISK] R1 — Default leverage 100x will cause instant liquidation #14 — Leverage safety: default changed from 100x → 3x; bot refuses to start with >10x unless OVERRIDE_HIGH_LEVERAGE=true
[RISK] R4 — No kill-switch / emergency stop mechanism #15 — Kill-switch Redis persistence: survives restarts; graceful fallback to in-memory if Redis unavailable

Commits

ff745bf9 [SECURITY] S4 — No Binance API rate limiting (IP ban risk) #12 Rate limiting + retry
2f329480 [RISK] R1 — Default leverage 100x will cause instant liquidation #14 Leverage safety
9ff9ddad [RISK] R4 — No kill-switch / emergency stop mechanism #15 Kill-switch Redis persistence

Testing

Import/logic checks passed locally
All env vars documented in .env.example

Next: Sprint 3

Paper trading validation (Binance testnet)
Live trading go/no-go decision

…N env var Hardcoded token 'trading-bot-token' exposed Vault credentials to anyone with repo read access. Now raises EnvironmentError if VAULT_TOKEN is not set, forcing operators to explicitly provide the secret at runtime.

…_PASSWORD env var Hardcoded password 'elvis_password' exposed database credentials in source. POSTGRES_CONFIG now reads all connection params from environment variables: DB_HOST, DB_PORT, DB_USER, DB_PASSWORD (required), DB_NAME Raises EnvironmentError at startup if DB_PASSWORD is not set.

The Flask API on 0.0.0.0:5050 had no authentication — any host on the network could read trade data. Added a before_request hook that validates the X-API-Key header against the API_KEY environment variable on every request except /health (exempted for load-balancer/Docker health checks). Returns HTTP 401 on invalid key, HTTP 503 if API_KEY is not configured.

Added three new endpoints to trading/utils/trade_history_api.py: POST /emergency_stop — Activate kill-switch (halts trading) DELETE /emergency_stop — Clear kill-switch (allow resume) GET /emergency_stop/status — Read current kill-switch state All endpoints are protected by the X-API-Key auth added in #13. The global KILL_SWITCH_ACTIVE flag must be polled by the trading engine in main.py to honour the stop signal. Activation is logged at CRITICAL level with timestamp and remote IP.

Added documentation for VAULT_TOKEN (#10), DB_PASSWORD (#11), and API_KEY (#13) with clear REQUIRED markers and usage notes.

Adds hyphen-variant routes (/emergency-stop, /emergency-stop/status) as aliases to the existing underscore endpoints, matching the sprint spec. Adds public is_trading_halted() helper so trading loops can import it: from trading.utils.trade_history_api import is_trading_halted if is_trading_halted(): break # stop all order placement All routes are protected by the X-API-Key auth from ISSUE #13. POST returns 200 with JSON confirmation; 409 if already stopped. DELETE clears the flag; GET /emergency-stop/status is read-only.

…ntial backoff - Add utils/binance_rate_limiter.py: * binance_retry decorator (tenacity): 5 attempts, exponential back-off 1-60 s * check_rate_limit_headers(): reads X-MBX-USED-WEIGHT-1M and X-MBX-ORDER-COUNT-1S headers; logs utilisation at DEBUG and auto-pauses (5 s default) when >80% of the 1200 req/min weight limit is consumed; honours Retry-After header * rate_limited_call() convenience wrapper for raw requests.Response callers * All thresholds configurable via env vars (BINANCE_RATE_LIMIT_WARN_FRACTION, BINANCE_WEIGHT_LIMIT_PER_MIN, BINANCE_RETRY_MULTIPLIER, etc.) - Wrap get_balance(), get_position(), get_current_price(), set_leverage() in binance_executor.py with @binance_retry inner functions - Add tenacity to requirements.txt

…idation - config/config.py: * DEFAULT_LEVERAGE changed from hardcoded 100 to 3 (read from DEFAULT_LEVERAGE env var, defaulting to 3 if unset) * Add validate_leverage_config(leverage) function: - leverage > 5x → WARNING log on every startup - leverage > 10x → EnvironmentError unless OVERRIDE_HIGH_LEVERAGE=true - leverage <= 0 → ValueError immediately - trading/execution/binance_executor.py: * __init__ default_leverage parameter changed from int=100 to int=None (resolves from TRADING_CONFIG['DEFAULT_LEVERAGE'] when not explicitly passed) * Calls validate_leverage_config() on startup — bot refuses to run with leverage > 10x unless operator has set OVERRIDE_HIGH_LEVERAGE=true - .env.example: document DEFAULT_LEVERAGE, OVERRIDE_HIGH_LEVERAGE, and all new Issue #12 / #15 env vars

…starts) - Replace purely in-memory kill-switch state with Redis-backed persistence: * On ACTIVATE → redis.set('ELVIS_KILL_SWITCH', '1') * On DEACTIVATE → redis.delete('ELVIS_KILL_SWITCH') * On STARTUP → reads Redis; if key is present, sets KILL_SWITCH_ACTIVE=True and logs a CRITICAL warning — trading stays halted until a human explicitly clears it via DELETE /emergency_stop - Add helper functions: * _get_redis_client() — connects using REDIS_HOST/PORT/DB/PASSWORD env vars * _redis_set_kill_switch() — writes/deletes key; returns bool (success) * _redis_get_kill_switch() — reads key; falls back to in-memory if Redis down - GET /emergency_stop/status now reads from Redis on every call and syncs the in-memory flag (response includes source: 'redis') - POST /emergency_stop and DELETE /emergency_stop responses now include redis_persisted / redis_updated fields to confirm persistence - Graceful fallback: if Redis is unavailable, behaviour is identical to Sprint 1 (in-memory only) and a WARNING is logged — bot does not crash - All Redis connection params controlled by env vars (no hardcoding)

claude · 2026-02-18T06:45:13Z

Claude finished @cluster2600's task —— View job

Code Review Complete ✅

I've completed a comprehensive review of Sprint 2's reliability and risk management features. Here's my detailed feedback:

🎯 Overall Assessment: STRONG

This PR successfully implements three critical reliability features with solid engineering practices. The code demonstrates defensive programming, proper error handling, and good documentation.

✅ Strengths

Issue #12: Rate Limiting Implementation

Excellent use of tenacity for exponential backoff retry logic
Smart rate limit detection via response headers (X-MBX-USED-WEIGHT-1M)
Configurable thresholds via environment variables
Automatic pause at 80% capacity to prevent 429 errors
Proper exception handling for different Binance SDK versions

Issue #14: Leverage Safety Validation

Well-designed validation function with clear warning thresholds:
- > 5x: Warning logs (good user awareness)
- > 10x: Requires explicit OVERRIDE_HIGH_LEVERAGE=true override
Default reduced from dangerous 100x → safer 3x leverage
Validation occurs at startup preventing accidental high-leverage trading
Clean separation of concerns in config.py:116-166

Issue #15: Redis Kill-Switch Persistence

Robust fallback strategy: Redis → in-memory if Redis unavailable
RESTful API design with both /emergency_stop and /emergency-stop endpoints
Proper state synchronization between Redis and memory
Comprehensive logging with audit trail (IP, timestamp)
Process restart survival achieved correctly

⚠️ Areas for Improvement

1. Security Concerns

API Authentication: While Issue [SECURITY] S6 — Flask API exposed on 0.0.0.0:5050 with no authentication #13 adds X-API-Key requirement, consider:
- Rate limiting per API key to prevent abuse
- API key rotation mechanism
- Audit logging for authentication failures

2. Error Handling Edge Cases

Rate Limiter: binance_rate_limiter.py:84-91 - ClientError status code extraction could fail if attribute doesn't exist

# Consider safer approach:
status = getattr(exc, "status_code", None) or getattr(exc, "code", None)

3. Configuration Validation

Environment Variables: Missing validation for numeric env vars in binance_rate_limiter.py:42-47
Consider adding bounds checking:

_WEIGHT_WARN_FRACTION = max(0.1, min(1.0, float(os.getenv("...", "0.80"))))

4. Testing Gaps

No visible unit tests for the new rate limiting logic
Kill-switch Redis persistence edge cases not covered
Leverage validation boundary conditions need testing

🚀 Performance Considerations

Positive Impact

Rate limiting prevents API ban/suspension (major reliability win)
Redis persistence adds minimal overhead (~1ms per operation)
Exponential backoff reduces server load during issues

Potential Concerns

Redis calls in critical trading path (trade_history_api.py:578-584)
Consider connection pooling for Redis client reuse
Rate limit pause (5s default) may affect high-frequency strategies

🔒 Security Analysis

Good Practices

Fail-closed API authentication (blocks when API_KEY not set)
No hardcoded secrets (removed from previous versions)
Proper input validation in kill-switch endpoints

Recommendations

Add request size limits to prevent DoS via large payloads
Consider implementing API request logging for security auditing
Redis connection should use authentication if running in production

📊 Code Quality

Excellent

Clear docstrings and inline comments
Consistent error handling patterns
Proper type hints in new functions
Environment variable configuration with sensible defaults

Minor Issues

main.py has grown quite large (1300+ lines) - consider splitting
Some magic numbers could be made configurable (trade_history_api.py:231)

🧪 Test Coverage Recommendations

Rate Limiter Tests:
- Test retry behavior with different exception types
- Verify exponential backoff timing
- Test header parsing edge cases
Leverage Validation Tests:
- Boundary conditions (exactly 5x, 10x)
- Invalid override values
- Missing environment variables
Kill-Switch Tests:
- Redis connection failures
- Concurrent activation/deactivation
- State persistence across restarts

🎉 Final Verdict

APPROVAL RECOMMENDED ✅

This PR significantly improves the reliability and safety of the trading system. The implementations are well-engineered with proper error handling and configuration options. The minor issues identified don't block deployment but should be addressed in future iterations.

Priority for follow-up:

Add comprehensive test suite
Split large files for maintainability
Add production Redis authentication
Implement API request auditing

Great work on implementing these critical safety features!

Sprint 2 — Rate limiting, leverage safety, Redis kill-switch

maxime added 9 commits February 17, 2026 22:27

docs: Update .env.example with all Sprint 1 required env variables

c8ff146

Added documentation for VAULT_TOKEN (#10), DB_PASSWORD (#11), and API_KEY (#13) with clear REQUIRED markers and usage notes.

cluster2600 merged commit 91af761 into main Feb 18, 2026
4 of 8 checks passed

cluster2600 added a commit that referenced this pull request Feb 18, 2026

Merge pull request #17 from cluster2600/sprint2/reliability-risk

672f102

Sprint 2 — Rate limiting, leverage safety, Redis kill-switch

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

Sprint 2 — Rate limiting, leverage safety, Redis kill-switch#17

Sprint 2 — Rate limiting, leverage safety, Redis kill-switch#17
cluster2600 merged 9 commits intomainfrom
sprint2/reliability-risk

cluster2600 commented Feb 18, 2026

Uh oh!

claude bot commented Feb 18, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

cluster2600 commented Feb 18, 2026

Sprint 2: Reliability & Risk Management

Issues fixed

Commits

Testing

Next: Sprint 3

Uh oh!

claude bot commented Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review Complete ✅

🎯 Overall Assessment: STRONG

✅ Strengths

Issue #12: Rate Limiting Implementation

Issue #14: Leverage Safety Validation

Issue #15: Redis Kill-Switch Persistence

⚠️ Areas for Improvement

1. Security Concerns

2. Error Handling Edge Cases

3. Configuration Validation

4. Testing Gaps

🚀 Performance Considerations

Positive Impact

Potential Concerns

🔒 Security Analysis

Good Practices

Recommendations

📊 Code Quality

Excellent

Minor Issues

🧪 Test Coverage Recommendations

🎉 Final Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude bot commented Feb 18, 2026 •

edited

Loading